# Multimodal Vision-Language Models

## InternVL3-8B-bf16

InternVL3-8B-bf16 is a vision-language model converted to the MLX format, supporting multilingual image-to-text tasks.

- License: Other
- Tags: Image-to-Text, Transformers, Other
- Publisher: mlx-community · Downloads: 96 · Likes: 1

## Llama-4-Scout-17B-16E-8bit

An MLX-format model converted from Meta's Llama-4-Scout-17B-16E, supporting multilingual and vision-language tasks.

- License: Other
- Tags: Image-to-Text, Transformers, Multilingual
- Publisher: mlx-community · Downloads: 252 · Likes: 0
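
Both MLX conversions above can be run locally on Apple Silicon. Below is a minimal inference sketch assuming the mlx-vlm package's load/generate API; the image path and prompt are placeholders, and the argument order of generate has varied across mlx-vlm versions, so check the package README for your installed release:

```python
# Minimal sketch: running an mlx-community VLM on Apple Silicon with mlx-vlm.
# Assumes `pip install mlx-vlm`; image path and prompt are placeholders.
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "mlx-community/InternVL3-8B-bf16"
model, processor = load(model_path)  # downloads weights from the Hub on first use
config = load_config(model_path)

images = ["photo.jpg"]  # local path or URL
prompt = apply_chat_template(processor, config, "Describe this image.",
                             num_images=len(images))
print(generate(model, processor, prompt, images, verbose=False))
```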

## Qwen2.5VL-3B-VLM-R1-REC-500steps

A vision-language model based on Qwen2.5-VL-3B-Instruct, trained with the VLM-R1 reinforcement-learning framework for 500 steps and focused on referring expression comprehension (REC) tasks.

- Tags: Image-to-Text, Safetensors, English
- Publisher: omlab · Downloads: 976 · Likes: 22
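
For a REC-style query, the model is prompted for the location of a described object. A sketch assuming the standard Qwen2.5-VL interface in recent transformers (>= 4.49); the image path and prompt wording are illustrative, and the exact box output format is defined by the model's training:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

repo = "omlab/Qwen2.5VL-3B-VLM-R1-REC-500steps"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    repo, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(repo)

# Referring expression comprehension: ask for the box of a described object.
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Output the bounding box of the dog on the left."},
    ],
}]
text = processor.apply_chat_template(messages, tokenize=False,
                                     add_generation_prompt=True)
image = Image.open("photo.jpg").convert("RGB")
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(out[:, inputs["input_ids"].shape[1]:],
                             skip_special_tokens=True)[0])
```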

## Eagle2-9B

Eagle2 is a high-performance series of vision-language models focused on improving performance through optimized data strategies and training methods. Eagle2-9B is the largest model in the series, striking a good balance between performance and inference speed.

- Tags: Image-to-Text, Transformers, Other
- Publisher: KnutJaegersberg · Downloads: 15 · Likes: 4

## Eagle2-9B

Eagle2-9B is the latest vision-language model (VLM) released by NVIDIA, striking a strong balance between performance and inference speed. It is built on the Qwen2.5-7B-Instruct language model with a SigLIP+ConvNeXt vision encoder, and supports multilingual and multimodal tasks.

- Tags: Image-to-Text, Transformers, Other
- Publisher: nvidia · Downloads: 944 · Likes: 52
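
Eagle2 ships custom modeling code on the Hub, so loading it requires trust_remote_code. A hedged sketch follows; the chat-message format below is an assumption modeled on other Qwen2.5-based VLMs, and the authoritative inference recipe is the one on the model card:

```python
import torch
from transformers import AutoModel, AutoProcessor

repo = "nvidia/Eagle2-9B"

# Custom architecture: trust_remote_code pulls the modeling code from the Hub.
model = AutoModel.from_pretrained(
    repo, torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto"
)
processor = AutoProcessor.from_pretrained(repo, trust_remote_code=True)

# ASSUMPTION: chat-template message format as used by Qwen2.5-based VLMs;
# consult the model card for the exact generation entry points.
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "photo.jpg"},
        {"type": "text", "text": "Describe this image."},
    ],
}]
prompt = processor.apply_chat_template(messages, tokenize=False,
                                       add_generation_prompt=True)
# From here, encode the image and call model.generate() per the model card.
```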

## ViTamin-XL-256px

ViTamin-XL-256px is a vision-language model based on the ViTamin architecture, designed for efficient visual feature extraction and multimodal tasks, with support for high-resolution image processing.

- License: MIT
- Tags: Feature Extraction, Transformers
- Publisher: jienengchen · Downloads: 655 · Likes: 1
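
Since the card emphasizes visual feature extraction, a minimal embedding sketch follows. The encode_image entry point and CLIP-style preprocessing are assumptions; the checkpoint relies on custom Hub code, so check the model card for the actual API:

```python
import torch
from PIL import Image
from transformers import AutoModel, CLIPImageProcessor

repo = "jienengchen/ViTamin-XL-256px"

# ViTamin checkpoints load through custom code from the Hub.
model = AutoModel.from_pretrained(repo, trust_remote_code=True).eval()
processor = CLIPImageProcessor.from_pretrained(repo)  # ASSUMPTION: CLIP-style preprocessing

image = Image.open("photo.jpg").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# ASSUMPTION: a CLIP-style encode_image method exposed by the custom code.
with torch.no_grad():
    features = model.encode_image(pixel_values)
print(features.shape)  # one embedding vector per image
```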